Cellular functions are carried out by complexes of coordinately functioning proteins. Understanding
genome organization and gene functions in diverse organisms can reveal new insights into the evolution
of coordinated gene expression mechanisms. It has been suggested that the gene sets of different species
are mostly similar, whereas the regulatory mechanisms of gene expression are expected to evolve in a
species-specific manner. In this work, genome-wide features of the organization, transcription and
evolution of 5 fish and 4 crustacean species were explored. An interspecies BLAST comparison of
annotated protein sets revealed that the inter-species protein diversity of crustaceans varies over a much
wider range. Moreover, in some cases crustacean-fish protein conservation appears to be higher than
crustacean-crustacean homology. A search for possible traces of mitochondrial DNA (mtDNA) in the
nuclear genomes of 8 crustacean and fish species found that only one crustacean and one fish species
(Armadillidium vulgare and Carassius auratus) carry long (>500 bp) insertions of mtDNA in the nuclear
genome, including an almost complete insertion of the organelle DNA in the C. auratus nuclear genome.
Analysis of the promoter architecture of nuclear genes in 8 crustacean and fish species revealed that
(1) most protein genes have at least one putative bidirectional promoter (BDP), and (2) hundreds of genes
in these genomes are closely arranged in a Head-to-Head manner with a potential BDP between them.
It is concluded that BDPs may play a key role in the coordinated transcription of crustacean and fish
genes involved in the same cellular processes.


Keywords: Fish, crustaceans, mtDNA, bidirectional promoter, Head-to-Head genes 


INTRODUCTION


Current knowledge suggests that aquatic vertebrates evolved from heavy
benthic microphages into floating, mobile and omnivorous forms.
Vertebrates, including the first fishes, probably originated about
530 million years ago during the Cambrian explosion. Fishes seem to have
evolved in freshwater, whereas crustaceans mostly evolved in marine
habitats (Wagele, 1992). The bony fishes (Osteichthyes), with about
27 000 living species, represent more than 50% of all known vertebrate
species (Spaink et al., 2013). Crustaceans (crabs, lobsters, crayfish,
shrimps, prawns, krill, woodlice and barnacles), which diversified over
455 million years ago, form a large, diverse arthropod taxon. To date,
67 000 crustacean species have been described. Decapods (crabs, shrimp
and lobsters), the most readily recognizable crustaceans, include over
15 000 living and 3000 fossil species from 233 families (Wolfe et al.,
2019; Stillman et al., 2008).

Cellular functions are carried out by complexes of coordinately
functioning proteins. Therefore, the evolution of closely related species
could involve the coordinated gain or loss of such gene groups. Thus,
understanding genome organization and gene functions via comparative
genomic and proteomic studies in extremely diverse organisms can reveal
new insights into the evolution of coordinated gene expression mechanisms
(Martin and Fraser, 2018). In particular, it is supposed that the gene
(protein and non-coding RNA) sets of different species are mostly similar,
whereas the regulatory mechanisms of gene expression are expected to
evolve in a species-specific manner. Comparative genomic studies are
important for both fundamental science and practical use (the development
of new medicines, sustainable aquaculture strategies, etc.). In recent
years, great advances have been made in genomic studies of bony fish
species and some crustaceans (Spaink et al., 2013; Martin and Fraser,
2018). To date, the nuclear genomes of about 300 fish and over 10
crustacean species have been sequenced
(https://www.ncbi.nlm.nih.gov/genome/?term=animals).

In general, all known eukaryotic genetic systems consist of the nuclear
genome and a semi-autonomous mitochondrial genome; plants also have a
plastid genome. Mitochondrial functions are conserved in almost all
eukaryotes studied, and these organelles are thought to be of
endosymbiotic α-proteobacterial origin. The mitochondrial genome encodes
only about 10% of its proteins; most mitochondrial proteins are encoded
in the nuclear genome, synthesized in the cytosol and transported to the
organelles. Compared with plants, animals have smaller mitochondrial
genomes (Adams et al., 2002; Burger et al., 2003; Herrmann et al., 2003).
Current findings suggest that, during evolution, most of the organellar
genes were transferred to the nucleus; to date, this transfer process has
been discovered mostly in plants (Shahmuradov et al., 2003; Barbrook et
al., 2006; Noutsos et al., 2007). Moreover, organelle-to-nucleus gene
transfer seems to be ongoing (Sheppard et al., 2008).

Transcription is the first, decisive phase of genome expression. Genome
transcription (when, where and how) is regulated by RNA polymerases,
transcription factors and promoters. Protein coding genes are transcribed
by RNA Polymerase II (Pol II; Solovyev et al., 2010; Danino et al., 2015).
Previously, promoters were thought to be unidirectional, that is, able to
initiate transcription on only a single strand of DNA. Recently, however,
it was shown that a single promoter can initiate transcription in both
directions (Wei et al., 2011; Duttke et al., 2015; Bagchi and Iyer, 2016;
Weingarten-Gabbay et al., 2019). Moreover, it was found that most genes
are transcribed from alternative promoters. The use of alternative
promoters is regulated in a cell/tissue-specific manner, depending on the
developmental stage and/or environmental signals. Alternative
transcription initiation seems to be one of the main principles of RNA
metabolism (Chen et al., 2016).

In this work, the genome-wide peculiarities of 
organization, transcription and evolution of some 
fish and crustacean species were explored. Results 
of these studies are presented below. 


MATERIALS AND METHODS 


For the inter-species comparison of protein sets, the promoter studies
and the search for possible fragments of mitochondrial DNA (mtDNA) in the
nucleus, 4 crustacean (Armadillidium vulgare, Eurytemora affinis,
Hyalella azteca and Daphnia pulex) and 5 fish species (Carassius auratus,
Neolamprologus brichardi, Oryzias latipes, Salmo salar and Cyprinus
carpio) with sequenced and annotated nuclear genomes were selected
(https://www.ncbi.nlm.nih.gov/genome/browse#!/overview/).

The pairwise comparison of DNA and protein sequences was performed with
the BLAST package (Altschul et al., 1997) and the BLAN computer program
(I. Shahmuradov, unpublished). To search for nuclear copies of mtDNA
sequences, based on the BLAST comparison of the nuclear and mitochondrial
genomes, we applied the TRANSFER computer program (I. Shahmuradov,
unpublished).

The search for bidirectional Pol II promoters (BDPs) and the analysis of
the mutual location of neighbor protein genes in a genome were done with
the TSShm and BDPGfinder computer programs (I. Shahmuradov, unpublished).
Table 1. Summary of the reciprocal interspecies BLAST comparisons of all annotated proteins from 4 crustacean
and 5 fish species

Species, proteins* | Avu (Cr) | Eaf (Cr) | Haz (Cr) | Dpu (Cr) | Cca (Fish)  | Cau (Fish)  | Nbr (Fish)  | Ola (Fish)  | Ssa (Fish)
Avu (Cr), 19051    | -        | 199/246  | 430/437  | 326/356  | 224/652     | 221/990     | 212/332     | 193/399     | 230/857
Eaf (Cr), 30425    | 246/199  | -        | 277/229  | 492/400  | 342/733     | 269/1341    | 242/388     | 226/532     | 291/1037
Haz (Cr), 22749    | 437/430  | 229/277  | -        | 415/407  | 248/693     | 217/1211    | 215/365     | 198/503     | 249/890
Dpu (Cr), 30611    | 356/326  | 400/492  | 407/415  | -        | 418/1257    | 392/2225    | 372/750     | 350/997     | 403/2166
Cca (Fish), 63928  | 652/224  | 733/342  | 693/248  | 1257/418 | -           | 43432/69140 | 11469/8768  | 11115/12174 | 15782/29117
Cau (Fish), 96703  | 990/221  | 1341/269 | 1211/217 | 2225/392 | 69140/43432 | -           | 17985/7352  | 16930/9402  | 22184/15784
Nbr (Fish), 31372  | 332/212  | 388/242  | 365/215  | 750/372  | 8768/11469  | 7352/17985  | -           | 12284/15342 | 9884/21312
Ola (Fish), 44766  | 399/193  | 532/226  | 503/198  | 997/350  | 12174/11115 | 9402/16930  | 15342/12284 | -           | 12664/19639
Ssa (Fish), 97555  | 857/230  | 1037/291 | 890/249  | 2166/403 | 29117/15782 | 15784/22184 | 21312/9884  | 19639/12664 | -

Species selected for pairwise BLAST comparison: Avu: A. vulgare, Eaf: E. affinis, Haz: H. azteca, Dpu: D. pulex,
Cca: C. carpio, Cau: C. auratus, Nbr: N. brichardi, Ola: O. latipes, Ssa: S. salar. Cr: crustaceans. Full-length
homology level: ≥80%. *: number of analyzed protein sequences for every species.





The TSShm program searches for CpG, non-CpG/TATA and non-CpG/TATA-less
promoters in animal DNA sequences. Depending on the promoter class, its
prediction accuracy is 90-98%, the highest among analogous methods. The
BDPGfinder program identifies closely located Head-to-Head (H2H) gene
pairs that may be associated with BDP(s) by analyzing the genome-wide
TSShm results and genome annotation files in GenBank format. Hereinafter,
a DNA region containing a pair of transcription start sites (TSSs) on
opposite DNA strands at a distance of less than 300 bp is termed a BDP.
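
The BDP criterion above can be illustrated with a minimal Python sketch. It is not the unpublished TSShm or BDPGfinder code: the function name `find_bdp_candidates` and the input format (lists of predicted TSS coordinates for each strand of one sequence) are assumptions made for illustration only. The sketch reports every pair of TSSs on opposite strands that lie less than 300 bp apart.

```python
from bisect import bisect_left, bisect_right

MAX_TSS_DISTANCE = 300  # bp; the text defines a BDP by a TSS distance of less than 300 bp


def find_bdp_candidates(plus_tss, minus_tss, max_dist=MAX_TSS_DISTANCE):
    """Return (minus_pos, plus_pos) pairs of TSSs on opposite strands
    that are less than max_dist bp apart.

    plus_tss / minus_tss: hypothetical lists of TSS coordinates predicted
    on the plus and minus strand of the same sequence.
    """
    plus_sorted = sorted(plus_tss)
    pairs = []
    for m in sorted(minus_tss):
        # Index of the first plus-strand TSS with position > m - max_dist
        lo = bisect_right(plus_sorted, m - max_dist)
        # Index of the first plus-strand TSS with position >= m + max_dist
        hi = bisect_left(plus_sorted, m + max_dist)
        for p in plus_sorted[lo:hi]:
            pairs.append((m, p))
    return pairs
```

Sorting one strand and using binary search keeps the scan close to linear in the number of TSSs, which matters when the predictions cover a whole genome.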


RESULTS AND DISCUSSION 


To study trends in the evolution of fishes and crustaceans, we performed
a pairwise interspecies BLAST comparison of all proteins annotated to date
from 4 crustacean and 5 fish species: the crustaceans A. vulgare (19051
proteins), E. affinis (30425), H. azteca (22749) and D. pulex (30611), and
the fishes C. auratus (96703), C. carpio (63928), N. brichardi (31372),
O. latipes (44766) and S. salar (97555). The results are summarized in
Table 1. Whereas thousands of proteins from the 5 fish species show a high
(≥80%) level of full-length evolutionary conservation, the inter-species
protein diversity of crustaceans seems to be high. Moreover, in some cases
crustacean-fish protein conservation seems to be higher than
crustacean-crustacean homology (see Table 1). The biological relevance of
these findings remains to be investigated.

Further, we searched for possible traces of mtDNA in the nuclear genomes
of 4 crustacean (A. vulgare, E. affinis, H. azteca and D. pulex) and
4 fish species (C. auratus, N. brichardi, S. salar and C. carpio). For
this purpose, the nuclear and mitochondrial genome sequences of each
species were compared by BLAST, and the BLAST results were analyzed using
the BLAN and TRANSFER programs. In contrast to earlier findings in higher
plants (Adams et al., 2002; Shahmuradov et al., 2003; Shahmuradov et al.,
2010), only one crustacean (A. vulgare) and one fish (C. auratus) species
were found to have long (≥500 bp) insertions of mtDNA in the nuclear
genome (Table 2).
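
The length filter applied to the BLAST results can be sketched as follows. This is not the unpublished BLAN or TRANSFER code. It assumes, hypothetically, that the nuclear-versus-mitochondrial comparison was saved in the standard BLAST tabular format (`-outfmt 6`), whose first four columns are query ID, subject ID, percent identity and alignment length. The names `long_hits` and `covered_fraction` are illustrative.

```python
MIN_INSERT_LEN = 500  # bp; threshold used in the text for "long" mtDNA insertions


def long_hits(blast_tab_lines, min_len=MIN_INSERT_LEN):
    """Yield (query_id, subject_id, percent_identity, aligned_length) for
    BLAST tabular hits whose alignment length is at least min_len."""
    for line in blast_tab_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query_id, subject_id = fields[0], fields[1]
        percent_identity = float(fields[2])
        aligned_length = int(fields[3])
        if aligned_length >= min_len:
            yield query_id, subject_id, percent_identity, aligned_length


def covered_fraction(insertion_len, mito_genome_len):
    """Fraction of the mitochondrial genome represented by one insertion."""
    return insertion_len / mito_genome_len
```

With the lengths reported for C. auratus in Table 2, `covered_fraction(15658, 16580)` is about 0.94, which is what "almost complete insertion" refers to.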





Table 2. mtDNA insertions in the nuclear genomes of analyzed crustacean and fish species

Organism   | Length of mitochondrial genome, bp | Length of the mtDNA insertion, bp
A. vulgare | 13939                              | 2545
           |                                    | 1168
C. auratus | 16580                              | 15658











In particular, an almost complete insertion of mtDNA was found in a fish
nuclear genome for the first time. The results of this and previous
studies, which show striking differences in the number and total length of
DNA sequences of mitochondrial origin in nuclear genomes, suggest that
Mitochondrion-to-Nucleus transfer in crustacean and fish species occurred
after these species separated from a common ancestor.

In any genome, neighboring genes may be organized in a Tail-to-Head (T2H),
Tail-to-Tail (T2T) or Head-to-Head (H2H) manner (Fig. 1). In particular,
read-through transcription of closely located T2H neighbor genes may
produce chimeric transcripts (proteins). Closely located H2H genes might
be coordinately transcribed from a BDP between them. The H2H arrangement
of close gene pairs therefore seems to be very important for the
coordinated transcription (expression) of genes via BDP(s).


Figure 1. Schematic presentation of the T2H, T2T and H2H gene arrangements.
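
The three arrangements can be told apart from strand information alone, as the following Python sketch shows. It is an illustration, not the BDPGfinder implementation. The `Gene` record and function names are hypothetical, and the distance is taken as the simple coordinate difference between the end of the left gene and the start of the right gene, which is an assumption about the convention used.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int   # leftmost coordinate on the sequence
    end: int     # rightmost coordinate on the sequence
    strand: str  # "+" or "-"


def classify_pair(left: Gene, right: Gene) -> str:
    """Classify two neighbor genes; `left` must begin before `right`."""
    if left.strand == right.strand:
        return "T2H"  # same strand: the tail of one gene faces the head of the other
    if left.strand == "-" and right.strand == "+":
        return "H2H"  # divergent: the 5' ends (heads) face each other
    return "T2T"      # convergent: the 3' ends (tails) face each other


def intergenic_distance(left: Gene, right: Gene) -> int:
    """Coordinate difference between neighbors; zero or below means overlap."""
    return right.start - left.end


def classify_neighbors(genes):
    """Return (left_name, right_name, arrangement, distance) for each adjacent pair."""
    ordered = sorted(genes, key=lambda g: (g.start, g.end))
    return [
        (a.name, b.name, classify_pair(a, b), intergenic_distance(a, b))
        for a, b in zip(ordered, ordered[1:])
    ]
```

Only the H2H pairs returned by such a scan are candidates for a shared BDP, because only there do both transcription start regions face the same intergenic interval.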


In this study, for 3 crustacean (A. vulgare, E. affinis and H. azteca) and
5 fish (C. carpio, C. auratus, N. brichardi, O. latipes and S. salar)
species, we performed (1) a genome-wide analysis of the mutual location of
neighbor genes, (2) a search for putative BDPs for all annotated protein
genes and (3) an identification of potential H2H gene pairs with a BDP
between them. The results are summarized in Table 3. Initially, using the
TSShm tool, we searched for CpG, non-CpG/TATA and non-CpG/TATA-less
promoters in the [-1000:+100] regions of these genes (+1 corresponds to
the gene start). Then, applying the BDPGfinder tool, we explored the
putative BDPs for every gene analyzed. At least one putative BDP was
identified for 73-83% of the protein genes in the 8 species (see Table 3);
most of these promoters belong to the CpG class (data not shown).
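
How a [-1000:+100] region maps onto genomic coordinates depends on the strand of the gene. The Python sketch below shows one way to compute and extract it. It assumes 1-based coordinates with no position 0 (so -1 is immediately upstream of +1), which the text does not state explicitly; the function names are hypothetical and this is not the TSShm code.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def promoter_window(gene_start, strand, upstream=1000, downstream=100, seq_len=None):
    """Return (left, right), 1-based inclusive genomic coordinates of the
    [-upstream:+downstream] region around a gene start (+1)."""
    if strand == "+":
        left, right = gene_start - upstream, gene_start + downstream - 1
    elif strand == "-":
        # On the minus strand, upstream lies at higher genomic coordinates
        left, right = gene_start - downstream + 1, gene_start + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    left = max(1, left)  # clip at the start of the sequence
    if seq_len is not None:
        right = min(seq_len, right)  # clip at the end of the sequence
    return left, right


def extract_promoter(seq, gene_start, strand, upstream=1000, downstream=100):
    """Return the promoter region in the 5'->3' orientation of the gene."""
    left, right = promoter_window(gene_start, strand, upstream, downstream, len(seq))
    region = seq[left - 1:right]
    if strand == "+":
        return region
    return region.translate(_COMPLEMENT)[::-1]  # reverse complement
```

For a gene on the plus strand starting at position 5000, the window is 4000 to 5099, that is, 1100 bp in total.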


[Figure: region of C. auratus Chr1 between positions 13404484 and 13404885,
with LOC113069938 and LOC113070208 on opposite strands]

Figure 2. Bi-directional CpG-island promoter between the neighbor genes
LOC113069938 and LOC113070208, located on opposite strands of
chromosome 1 of C. auratus (goldfish). These genes encode succinate-CoA
ligase [ADP/GDP-forming] subunit alpha (mitochondrial; protein ID:
XP_026098902.1) and CDGSH iron-sulfur domain-containing protein 2-like
(protein ID: XP_026099249.1), respectively. Intergenic distance: 401 bp;
inter-TSS distance: 196 bp.


Table 3. Some peculiarities of the genomic organization and promoter architecture of protein coding genes in 3
crustacean and 5 fish species

Species, genes*   | T2H (total/a/b) | T2T (total/a/b) | H2H (total/a/b) | BDPs       | BDPGs
Avu (Cr), 6152    | 1975/55/0       | 1073/20/58      | 739/18/66       | 5121; 83%  | 13
Eaf (Cr), 19743   | 11504/615/676   | 3337/105/206    | 3339/341/555    | 14935; 76% | 300
Haz (Cr), 17842   | 7890/408/179    | 3510/397/238    | 3003/309/140    | 14425; 81% | 303
Cca (Fish), 24424 | 12986/272/507   | 5720/161/279    | 5718/637/592    | 18785; 77% | 550
Cau (Fish), 39645 | 21063/700/375   | 9311/701/755    | 9293/1671/960   | 3096; 78%  | 1416
Nbr (Fish), 18486 | 9298/157/148    | 4310/328/195    | 4315/717/312    | 14117; 76% | 599
Ola (Fish), 22040 | 11390/496/204   | 5284/405/613    | 5340/997/831    | 17331; 79% | 883
Ssa (Fish), 6719  | 3517/124/78     | 1610/91/121     | 1606/224/230    | 4904; 73%  | 175

*: number of analyzed protein genes for every species. a: pairs of genes located at a distance of <1000 bp and
≥50 bp; b: number of overlapping gene pairs. The designation of the species is the same as in Table 1.
Finally, H2H pairs of neighboring, non-overlapping genes located 50 bp to
1000 bp apart were examined for putative BDP(s) between them. The number
of H2H gene pairs with a shared BDP varies over a wide range among the
crustacean and fish species explored (Table 3).

In particular, with the exception of S. salar, all fish species are
significantly enriched in H2H pairs with a putative BDP. An example of an
adjacent H2H pair with a putative BDP between the genes is illustrated in
Fig. 2. Summarizing these results, it can be concluded that BDPs may play
a key role in the coordinated transcription of genes involved in the same
cellular processes.
Comparative study of the genome organization and evolution of some fish and crustacean species
A.Yu. Abdulazimova, K.G. Gasimov, M.A. Abbasov, T.A. Samadova, I.A. Shahmuradov


Institute of Molecular Biology and Biotechnologies, Azerbaijan National Academy of Sciences, Baku, Azerbaijan
Institute of Biophysics, Azerbaijan National Academy of Sciences, Baku, Azerbaijan
Genetic Resources Institute, Azerbaijan National Academy of Sciences, Baku, Azerbaijan

